scripts for SummaryMixing SSL #9
base: SummaryMixing_w2v2
Conversation
The main branch of this repository will keep tracking the latest version of SpeechBrain available. Unfortunately, the results reported in our [publication](https://arxiv.org/abs/2307.07421) and below in the table were obtained with SpeechBrain v0.5 and may not be exactly reproduced with the current code. If you want the exact same results, please use our dedicated [branch](https://github.com/SamsungLabs/SummaryMixing/tree/speechbrain_v0.5) that contains the code compatible with SpeechBrain v0.5!
# SummaryMixing wav2vec 2.0
We should not erase the previous README; the two should be combined.
@@ -0,0 +1,344 @@
""" SummaryMixing © 2023 by Samsung Electronics is licensed under CC BY-NC 4.0.
Why do we need to copy train.py? Did you change something in it?
@@ -0,0 +1,342 @@
""" SummaryMixing © 2023 by Samsung Electronics is licensed under CC BY-NC 4.0.
Same question.
@@ -0,0 +1,90 @@
""" SummaryMixing © 2023 by Samsung Electronics is licensed under CC BY-NC 4.0.
We should create a PR for wav2vec 2.0 pretraining on SpeechBrain with standard MHSA.
    latents = self.modules.normalize(
        latents, wav_lens, epoch=current_epoch
    ).detach()
elif self.hparams.frontend_type == "mel_v2":
All of these `if` branches should be removed, keeping only the correct one.
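As a minimal, hypothetical sketch of the reviewer's suggestion (the builder names below are placeholders, not the real SpeechBrain modules), the `if`/`elif` ladder over `frontend_type` could collapse into a single lookup so that only the frontend actually used remains in the training script:

```python
# Placeholder builders standing in for the real frontend constructors.
FRONTEND_BUILDERS = {
    "mel_v2": lambda hparams: ("mel_v2", hparams["mask_prob"]),
    "mel_cnn_base": lambda hparams: ("mel_cnn_base", hparams["mask_prob"]),
}

def build_frontend(hparams):
    """Select and build exactly one frontend from the hparams dict."""
    frontend_type = hparams["frontend_type"]
    if frontend_type not in FRONTEND_BUILDERS:
        raise ValueError(f"Unknown frontend_type: {frontend_type}")
    return FRONTEND_BUILDERS[frontend_type](hparams)
```

This keeps the training loop free of frontend-specific branching; adding a frontend becomes a one-line dictionary entry.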
    mask_prob=hparams["mask_prob"],
    mask_length=hparams["mask_length"],
)
elif hparams["frontend_type"] == "mel_cnn_base":
Same.
@@ -1,489 +0,0 @@ | |||
""" SummaryMixing © 2023 by Samsung Electronics is licensed under CC BY-NC 4.0. |
This file should not be deleted!!!!
@@ -1,359 +0,0 @@ | |||
# ############################################################################ |
This file should not be deleted!
@@ -1,1044 +0,0 @@ | |||
""" SummaryMixing © 2023 by Samsung Electronics is licensed under CC BY-NC 4.0. |
Don't delete!
@@ -1,665 +0,0 @@ | |||
""" SummaryMixing © 2023 by Samsung Electronics is licensed under CC BY-NC 4.0. |
Don't delete!
Hi @shucongzhang, a few questions about this prep script.
- For step 2, what do you mean by the VAD script? (I'm using cut_by_vad.py, but there is another VAD script.)
- Also, from a brief look at the lengths of the audio files, I believe you could remove the majority of the data by limiting utterances to only 20.2 seconds, using the following code. Do you know how many hours are left after this?
def make_csv_for_each(subpath_1_csv_file_folder, max_length=20.2):
    # other code
    if duration_seconds > max_length:
        continue
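The snippet above elides the surrounding loop; a self-contained sketch of the same filtering idea (with hypothetical helper names, not the real prep script) might look like:

```python
def filter_by_max_length(rows, max_length=20.2):
    """Drop utterances longer than max_length seconds.

    rows mimics the (path, duration_seconds) records that the CSV
    prep script iterates over.
    """
    return [(path, dur) for path, dur in rows if dur <= max_length]

def hours_kept(rows, max_length=20.2):
    # Total duration of the surviving utterances, in hours.
    return sum(dur for _, dur in filter_by_max_length(rows, max_length)) / 3600.0
```

Comparing `hours_kept` before and after filtering would answer the "how many hours are left" question directly.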
Just to give an estimate: for the large set, you would only have about 100 hours of audio (instead of 51k).
Hello @whettenr, thank you for your question. The VAD script I'm referring to is cut_by_vad.py in the libri-light GitHub repo. It cuts the books into utterances as close as possible to target_len_sec. There are some issues with our server that contains the whole libri-light dataset, so I have tested the scripts with the small split. What I did is:
python cut_by_vad.py --input_dir libri-light/small/ --output_dir libri-light/small_20s_vad/ --target_len_sec 20
python make_librilight_csv.py small_20s_vad small_20s_vad_csv
By this, I got 356 hours of data with 12.4s/15.8s/18.1s 25th/50th/75th percentile utterance lengths.
Can you also try the steps above for the small subset? Please let me know if the amount of data you get differs from the numbers above.
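To sanity-check the quoted 25th/50th/75th percentile utterance lengths, a hedged sketch (reading durations from the actual audio files or CSV is omitted; `durations_sec` would come from e.g. soundfile or the generated CSV):

```python
def percentile(durations_sec, p):
    """Nearest-rank percentile of a list of durations, in seconds."""
    vals = sorted(durations_sec)
    k = round(p / 100 * (len(vals) - 1))
    return vals[k]

def length_summary(durations_sec):
    # The three percentiles reported in this thread.
    return {p: percentile(durations_sec, p) for p in (25, 50, 75)}
```

Running `length_summary` over the small split's durations should reproduce something close to the 12.4s/15.8s/18.1s figures above if the VAD step was configured the same way.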
Thanks for the quick response! I did not pass --target_len_sec 20; that could definitely make a huge difference. I think that is why, for me, the VAD was cutting files into lengths around 50 and 60 seconds. I will try with the small split and let you know.
@shucongzhang did it and got the same 356 hours of data with 12.4s/15.8s/18.1s 25th/50th/75th percentiles.
This PR provides the necessary code and recipes to reproduce the results of the SummaryMixing SSL paper.